Multi-objective Discounted Reward Verification in Graphs and MDPs

نویسندگان

  • Krishnendu Chatterjee
  • Vojtech Forejt
  • Dominik Wojtczak
چکیده

We study the problem of achieving a given value in Markov decision processes (MDPs) with several independent discounted reward objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited and on the objective. This definition extends the usual definition of discounted reward, and allows to capture the systems in which the value of different commodities diminish at different and variable rates. We establish results for two prominent subclasses of the problem, namely state-discount models where the discount factors are only dependent on the state of the MDP (and independent of the objective), and rewarddiscount models where they are only dependent on the objective (but not on the state of the MDP). For the state-discount models we use a straightforward reduction to expected total reward and show that the problem whether a value is achievable can be solved in polynomial time. For the reward-discount model we show that memory and randomisation of the strategies are required, but nevertheless that the problem is decidable and it is sufficient to consider strategies which after a certain number of steps behave in a memoryless way. For the general case, we show that when restricted to graphs (i.e. MDPs with no randomisation), pure strategies and discount factors of the form 1/n where n is an integer, the problem is in PSPACE and finite memory suffices for achieving a given value. We also show that when the discount factors are not of the form 1/n, the memory required by a strategy can be infinite.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reduction of Discounted Continuous-Time MDPs with Unbounded Jump and Reward Rates to Discrete-Time Total-Reward MDPs

This article discusses a reduction of discounted Continuous-Time Markov Decision Processes (CTMDPs) to discrete-time Markov Decision Processes (MDPs). This reduction is based on the equivalence of a randomized policy that chooses actions only at jump epochs to a nonrandomized policy that can switch actions between jumps. For discounted CTMDPs with bounded jump rates, this reduction was introduc...

متن کامل

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature. Commonly...

متن کامل

Total Expected Discounted Reward MDPs: Existence of Optimal Policies

This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.

متن کامل

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

متن کامل

Multi-Criteria Approaches to Markov Decision Processes with Uncertain Transition Parameters

Markov decision processes (MDPs) are a well established model for planing under uncertainty. In most situations the MDP parameters are estimates from real observations such that their values are not known precisely. Different types of MDPs with uncertain, imprecise or bounded transition rates or probabilities and rewards exist in the literature. Commonly the resulting processes are optimized wi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013